
    Improved variant discovery through local re-alignment of short-read next-generation sequencing data using SRMA

    A primary component of next-generation sequencing analysis is to align short reads to a reference genome, with each read aligned independently. However, reads that observe the same non-reference DNA sequence are highly correlated and can be used to better model the true variation in the target genome. A novel short-read micro re-aligner, SRMA, that leverages this correlation to better resolve a consensus of the underlying DNA sequence of the targeted genome is described here

    Elucidation of bioinformatic-guided high-prospect drug repositioning candidates for DMD via Swanson linking of target-focused latent knowledge from text-mined categorical metadata

    Duchenne Muscular Dystrophy (DMD)’s complex multi-system pathophysiology, coupled with the cost-prohibitive logistics of multi-year drug screening and follow-up, has hampered the pursuit of new therapeutic approaches. Here we conducted a systematic historical and text mining-based pilot feasibility study to explore the potential of established or previously tested drugs as prospective DMD therapeutic agents. Our approach utilized a Swanson linking-inspired method to uncover meaningful yet largely hidden deep semantic connections between pharmacologically significant DMD targets and drugs developed for unrelated diseases. Specifically, we focused on molecular target-based MeSH terms and categories as high-yield bioinformatic proxies, effectively tagging relevant literature with categorical metadata. To identify promising leads, we comprehensively assembled published reports from 2011 and sampled reports from subsequent years. We then determined the earliest year when distinct MeSH terms or category labels of the relevant cellular target were referenced in conjunction with the drug, as well as the year when the pertinent target itself was first conclusively identified as holding therapeutic value for DMD. By comparing the earliest year when the drug was identifiable as a DMD treatment candidate with the year of the first actual report confirming this, we computed an Index of Delayed Discovery (IDD), which serves as a metric of Swanson-linked latent knowledge. Using these findings, we identified data from previously unlinked articles subsetted via MeSH-derived Swanson linking or from target classes within the DrugBank repository. This enabled us to identify new but untested high-prospect small-molecule candidates that are of particular interest in repurposing for DMD and warrant further investigations
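
The abstract above describes the Index of Delayed Discovery only in words, so the following is a minimal sketch under stated assumptions rather than the authors' formula. It assumes that a drug becomes "identifiable" in the later of the two years the abstract mentions (drug and target first linked; target first tied to DMD), and that the IDD is the difference in years between that point and the first actual report. The `Candidate` class, its field names, and the example years are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """One drug/target pair; all field names are illustrative assumptions."""
    drug: str
    drug_target_year: int             # earliest year drug and target term co-occur
    target_dmd_year: int              # earliest year target was tied to DMD therapy
    first_report_year: Optional[int]  # first report of the drug for DMD, if any


def identifiable_year(c: Candidate) -> int:
    # Assumption: both links must exist before the drug can be inferred
    # as a candidate, so the later of the two years is the earliest
    # point at which the connection was discoverable.
    return max(c.drug_target_year, c.target_dmd_year)


def index_of_delayed_discovery(c: Candidate) -> Optional[int]:
    # Assumption: IDD is a plain difference in years. Untested drugs
    # (no first report yet) have no IDD and return None.
    if c.first_report_year is None:
        return None
    return c.first_report_year - identifiable_year(c)


if __name__ == "__main__":
    # Made-up years, for illustration only.
    tested = Candidate("drug_a", 1998, 2004, 2011)
    untested = Candidate("drug_b", 2009, 2001, None)
    print(index_of_delayed_discovery(tested))    # 7
    print(index_of_delayed_discovery(untested))  # None
```

Under these assumptions, a large IDD marks a connection that sat unnoticed in the literature for a long time, while candidates with no first report are the untested leads the abstract refers to.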

    SeqWare Query Engine: storing and searching sequence data in the cloud

    Background: Since the introduction of next-generation DNA sequencers, the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever-increasing demands. Results: In this work, we present the SeqWare Query Engine, which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc.) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). Conclusions: The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a natural fit for storing and searching ever-growing genome sequence datasets

    Expression profile of CREB knockdown in myeloid leukemia cells.

    Background: The cAMP Response Element Binding Protein, CREB, is a transcription factor that regulates cell proliferation, differentiation, and survival in several model systems, including neuronal and hematopoietic cells. We demonstrated that CREB is overexpressed in acute myeloid and leukemia cells compared to normal hematopoietic stem cells. CREB knockdown inhibits leukemic cell proliferation in vitro and in vivo, but does not affect long-term hematopoietic reconstitution. Methods: To understand downstream pathways regulating CREB, we performed expression profiling with RNA from the K562 myeloid leukemia cell line transduced with CREB shRNA. Results: By combining our expression data from CREB knockdown cells with prior ChIP data on CREB binding we were able to identify a list of putative CREB regulated genes. We performed extensive analyses on the top genes in this list as high confidence CREB targets. We found that this list is enriched for genes involved in cancer, and unexpectedly, highly enriched for histone genes. Furthermore, histone genes regulated by CREB were more likely to be specifically expressed in hematopoietic lineages. Decreased expression of specific histone genes was validated in K562, TF-1, and primary AML cells transduced with CREB shRNA. Conclusion: We have identified a high confidence list of CREB targets in K562 cells. These genes allow us to begin to understand the mechanisms by which CREB contributes to acute leukemia. We speculate that regulation of histone genes may play an important role by possibly altering the regulation of DNA replication during the cell cycle

    A Path to Implement Precision Child Health Cardiovascular Medicine.

    Congenital heart defects (CHDs) affect approximately 1% of live births and are a major source of childhood morbidity and mortality even in countries with advanced healthcare systems. Along with phenotypic heterogeneity, the underlying etiology of CHDs is multifactorial, involving genetic, epigenetic, and/or environmental contributors. Clear dissection of the underlying mechanism is a powerful step to establish individualized therapies. However, for the majority of CHDs the underlying genetic and environmental factors have yet to be clearly identified, and even fewer have effective therapies. Although the survival rate for CHDs is steadily improving, there is still a significant unmet need for refining diagnostic precision and establishing targeted therapies to optimize life quality and to minimize future complications. In particular, proper identification of disease-associated genetic variants in humans has been challenging, and this greatly impedes our ability to delineate gene-environment interactions that contribute to the pathogenesis of CHDs. Implementing a systematic multileveled approach can establish a continuum from phenotypic characterization in the clinic to molecular dissection using combined next-generation sequencing platforms and validation studies in suitable models at the bench. Key elements necessary to advance the field are: first, proper delineation of the phenotypic spectrum of CHDs; second, defining the molecular genotype/phenotype by combining whole-exome sequencing and transcriptome analysis; third, integration of phenotypic, genotypic, and molecular datasets to identify molecular networks contributing to CHDs; fourth, generation of relevant disease models and multileveled experimental investigations. In order to achieve all these goals, access to high-quality biological specimens from well-defined patient cohorts is a crucial step. Therefore, establishing a CHD BioCore is an essential piece of infrastructure and a critical step on the path toward precision child health cardiovascular medicine

    Celsius: a community resource for Affymetrix microarray data

    Celsius is a new system that serves as a warehouse, aggregating Affymetrix files and associated metadata, and contains the largest publicly available source of Affymetrix microarray data

    Gene connectivity, function, and sequence conservation: predictions from modular yeast co-expression networks

    BACKGROUND: Genes and proteins are organized into functional modular networks in which the network context of a gene or protein has implications for cellular function. Highly connected hub proteins, largely responsible for maintaining network connectivity, have been found to be much more likely to be essential for yeast survival. RESULTS: Here we investigate the properties of weighted gene co-expression networks formed from multiple microarray datasets. The constructed networks approximate scale-free topology, but this is not universal across all datasets. We show strong positive correlations between gene connectivity within the whole network and gene essentiality as well as gene sequence conservation. We demonstrate that the networks formed preserve a modular structure and that, within some of these modules, it is possible to observe a strong correlation between connectivity and essentiality or between connectivity and conservation, particularly within modules containing larger numbers of essential genes. CONCLUSION: Application of these techniques can allow a finer scale prediction of relative gene importance for a particular process within a group of similarly expressed genes
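
The abstract above does not say how connectivity was computed, so the following is only a sketch of one common convention for weighted co-expression networks, not the authors' method. It assumes that the weight between two genes is the absolute Pearson correlation of their expression profiles raised to a soft-threshold power `beta`, and that a gene's connectivity is the sum of its weights to all other genes. The function names, the default `beta`, and the toy profiles are hypothetical.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def connectivity(expr: Dict[str, List[float]], beta: float = 6.0) -> Dict[str, float]:
    """Sum of soft-thresholded co-expression weights for each gene.

    Assumption: weight(i, j) = |cor(i, j)| ** beta, with no self-weight.
    """
    genes = list(expr)
    k = {g: 0.0 for g in genes}
    for i, gene_i in enumerate(genes):
        for gene_j in genes[i + 1:]:
            weight = abs(pearson(expr[gene_i], expr[gene_j])) ** beta
            k[gene_i] += weight  # the network is undirected, so the
            k[gene_j] += weight  # weight counts toward both genes
    return k


if __name__ == "__main__":
    # Toy profiles, for illustration only: g1 and g2 are perfectly
    # correlated, g3 is only weakly related to either.
    toy = {
        "g1": [1.0, 2.0, 3.0, 4.0],
        "g2": [2.0, 4.0, 6.0, 8.0],
        "g3": [1.0, -1.0, 1.0, -1.0],
    }
    for gene, k in connectivity(toy).items():
        print(gene, round(k, 3))
```

Raising the correlation to a power suppresses weak correlations without discarding them, which is why highly connected hub genes stand out in such a network. The resulting connectivity values could then be compared against essentiality or conservation scores, as the abstract describes.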
